71 research outputs found

    Handling of current time in native XML databases

    The introduction of native XML databases opens many research questions related to the data models used to represent and manipulate data, including temporal data, in XML. The increasing use of XML for Valid Web pages warrants an adequate treatment of now in native XML databases. In this study, we examine how to represent and manipulate now-relative temporal data. We identify the different approaches used to represent current time in temporal XML databases, and we introduce the notion of storing variables such as `now' or `UC' as strings in native XML databases. All approaches are empirically evaluated on a query that time-slices the timeline at the current time. The experimental results indicate that the proposed extension offers several advantages over the other approaches: better semantics, less storage space and better response time.
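    To make the evaluated query concrete, below is a minimal Python sketch of the string-based extension the abstract describes: an open-ended validity interval stores the literal string `now', which is bound to the current date only when the time-slice query is evaluated. The document structure and attribute names are illustrative assumptions, not the paper's actual schema.

```python
# A minimal sketch of time-slicing now-relative XML data, assuming validity
# intervals are stored as vstart/vend attributes and open-ended facts use the
# string "now" as vend. Element and attribute names are hypothetical.
import xml.etree.ElementTree as ET
from datetime import date

DOC = """
<employees>
  <employee name="Alice" vstart="2019-01-01" vend="2021-06-30"/>
  <employee name="Bob"   vstart="2020-03-15" vend="now"/>
</employees>
"""

def time_slice(xml_text, slice_date):
    """Return names of elements whose validity interval contains slice_date."""
    bound_now = date.today().isoformat()  # bind 'now' at query time, not at storage time
    root = ET.fromstring(xml_text)
    hits = []
    for emp in root.iter("employee"):
        vend = emp.get("vend")
        vend = bound_now if vend == "now" else vend
        if emp.get("vstart") <= slice_date <= vend:  # ISO dates compare lexicographically
            hits.append(emp.get("name"))
    return hits

print(time_slice(DOC, date.today().isoformat()))  # ['Bob'] -- the current time-slice
```

    Because `now' is bound only at query time, open-ended facts stay correct without periodic rewrites, which is where the storage-space and response-time advantages come from.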

    Feature-based multiple models improve classification of mutation-induced stability changes

    BACKGROUND: Reliable prediction of stability changes in protein variants is an important aspect of computational protein design. A number of machine learning methods that can classify stability changes from the protein sequence alone have emerged. However, their performance on amino acid substitutions in previously unseen non-homologous proteins is rather limited. Moreover, the performance varies for different types of mutations depending on the secondary structure or accessible surface area of the mutation site. RESULTS: We propose feature-based multiple models, with each model designed for a specific type of mutation. The new method is composed of five models trained for mutations in exposed, buried, helical, sheet, and coil residues. The classification of a mutation as stabilising or destabilising is made as a consensus of two models, one selected based on the predicted accessible surface area and the other based on the predicted secondary structure of the mutation site. We refer to our new method as Evolutionary, Amino acid, and Structural Encodings with Multiple Models (EASE-MM). Cross-validation results show that EASE-MM provides a notable improvement over our previous work, reaching a Matthews correlation coefficient of 0.44. EASE-MM was able to correctly classify 73% and 75% of stabilising and destabilising protein variants, respectively. Using an independent test set of 238 mutations, we confirmed our results in a comparison with related work. CONCLUSIONS: EASE-MM not only outperformed other related methods but also achieved more balanced results across different types of mutations based on accessible surface area, secondary structure, or the magnitude of stability changes. This can be attributed to using multiple models with the most relevant features selected for the given type of mutation. Therefore, our results support the presumption that different interactions govern stability changes in exposed and buried residues, or in residues with a different secondary structure.
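    The model-selection step lends itself to a short sketch. The Python snippet below illustrates how a prediction could be assembled as a consensus of one ASA-selected and one secondary-structure-selected model; the five stand-in models, the 25% exposure threshold, and the agreement rule are assumptions for illustration, not the paper's trained models.

```python
# Hypothetical sketch of an EASE-MM-style consensus step: pick one model by
# predicted accessible surface area (ASA) and one by predicted secondary
# structure, then require both to call the mutation stabilising. The models,
# the 0.25 exposure threshold, and the tie rule are illustrative assumptions.
from typing import Callable, Dict

Model = Callable[[dict], bool]  # features -> True if predicted stabilising

def consensus_predict(features: dict,
                      asa_models: Dict[str, Model],
                      ss_models: Dict[str, Model]) -> bool:
    asa_key = "exposed" if features["pred_asa"] >= 0.25 else "buried"
    ss_key = features["pred_ss"]            # "helix", "sheet", or "coil"
    votes = (asa_models[asa_key](features), ss_models[ss_key](features))
    return all(votes)                       # consensus of the two selected models

# Stand-in models; the real method uses five separately trained predictors.
asa_models = {"exposed": lambda f: f["score"] > 0.0,
              "buried":  lambda f: f["score"] > 0.5}
ss_models  = {"helix":   lambda f: f["score"] > 0.2,
              "sheet":   lambda f: f["score"] > 0.3,
              "coil":    lambda f: f["score"] > 0.1}

print(consensus_predict({"pred_asa": 0.4, "pred_ss": "helix", "score": 0.6},
                        asa_models, ss_models))  # True
```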

    Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins

    BACKGROUND: Reliable prediction of stability changes induced by a single amino acid substitution is an important aspect of computational protein design. Several machine learning methods capable of predicting stability changes from the protein sequence alone have been introduced. The prediction performance of these methods is evaluated on mutations unseen during training. Nevertheless, mutations of the same protein, and even of the same residue, as encountered during training are commonly used for evaluation. We argue that a faithful evaluation can be achieved only when a method is tested on previously unseen proteins with low sequence similarity to the training set. RESULTS: We provided experimental evidence of the limitations of the evaluation commonly used for assessing prediction performance. Furthermore, we demonstrated that predicting stability changes in previously unseen non-homologous proteins is a challenging task for currently available methods. To improve the prediction performance of our previously proposed method, we identified features which led to over-fitting and further extended the model with new features. The new method employs Evolutionary And Structural Encodings with Amino Acid parameters (EASE-AA). Evaluated on an independent test set of more than 600 mutations, EASE-AA yielded a Matthews correlation coefficient of 0.36 and was able to correctly classify 66% of the stabilising and 74% of the destabilising mutations. For real-value prediction, EASE-AA achieved a correlation of 0.51 between predicted and experimentally measured stability changes. CONCLUSIONS: The commonly adopted evaluation, with mutations of the same protein, and even the same residue, randomly divided between the training and test sets, leads to an overestimation of prediction performance. Therefore, stability change prediction methods should be evaluated only on mutations in previously unseen non-homologous proteins. Under such an evaluation, EASE-AA predicts stability changes more reliably than currently available methods. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-S1-S4) contains supplementary material, which is available to authorized users.
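    The evaluation protocol the abstract argues for, where no protein is shared between training and test sets, can be sketched in a few lines. The grouping below splits by protein identifier only; a faithful reproduction would additionally enforce low sequence similarity between the two sets, which is omitted here.

```python
# Minimal sketch of a protein-level split: no protein (and hence no residue)
# appears in both training and test sets. A real setup would also cluster
# proteins by sequence similarity; that step is omitted in this sketch.
import random
from collections import defaultdict

def protein_level_split(mutations, test_fraction=0.2, seed=0):
    """mutations: list of dicts, each with at least a 'protein' key."""
    by_protein = defaultdict(list)
    for m in mutations:
        by_protein[m["protein"]].append(m)
    proteins = sorted(by_protein)
    random.Random(seed).shuffle(proteins)
    n_test = max(1, int(len(proteins) * test_fraction))
    test_ids = set(proteins[:n_test])
    train = [m for p in proteins[n_test:] for m in by_protein[p]]
    test = [m for p in test_ids for m in by_protein[p]]
    return train, test

muts = [{"protein": "P1", "pos": 10}, {"protein": "P1", "pos": 42},
        {"protein": "P2", "pos": 7}, {"protein": "P3", "pos": 99}]
train, test = protein_level_split(muts)
assert not {m["protein"] for m in train} & {m["protein"] for m in test}
```

    Randomly splitting the flat mutation list instead would let the two P1 mutations straddle the split, which is exactly the leakage the abstract says inflates performance estimates.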

    The Challenges and Issues Facing the Deployment of RFID Technology

    Griffith Sciences, School of Information and Communication Technology

    Querying now-relative data


    Voxel-based extraction of individual pylons and wires from lidar point cloud data

    Extraction of individual pylons and wires is important for modelling 3D objects in a power line corridor (PLC) map. However, existing methods mostly classify points into distinct classes such as pylons and wires, but hardly into individual pylons or wires. The proposed method extracts individual pylons, vegetation and wires from LiDAR data; the extraction of individual objects is needed for detailed PLC mapping. The proposed approach starts with the separation of ground and non-ground points. The non-ground points are then classified into vertical (e.g., pylons and vegetation) and non-vertical (e.g., wires) object points using the vertical profile feature (VPF) through a binary support vector machine (SVM) classifier. Individual pylons and vegetation are then separated using their shape and area properties. The locations of the pylons are further used to extract the span points between two successive pylons. Finally, the span points are voxelised, and the alignment properties of wires in the voxel grid are used to extract individual wire points. The results are evaluated on a dataset that has multiple spans with bundled wires in each span. The evaluation shows that the proposed method and features are very effective for the extraction of individual wires, pylons and vegetation, with 99% correctness and 98% completeness.
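    The vertical/non-vertical separation driven by the vertical profile can be illustrated with a small sketch. Here a point counts as "vertical" when its ground-grid column spans a large height range; the 1 m cell size and 5 m extent threshold are illustrative stand-ins, and the real method feeds such features to a binary SVM rather than applying a fixed threshold.

```python
# Illustrative sketch of a vertical-profile split of non-ground LiDAR points:
# bin points into a 2D ground grid and inspect each column's height extent.
# The cell size and the 5 m extent threshold are assumptions; the paper's
# method uses the vertical profile feature with an SVM classifier instead.
import numpy as np

def split_by_vertical_profile(points, cell=1.0, min_extent=5.0):
    """points: (N, 3) array of x, y, z; returns a boolean 'vertical' mask."""
    keys = np.floor(points[:, :2] / cell).astype(np.int64)
    extents = {}  # height extent (min z, max z) per occupied grid column
    for k, z in zip(map(tuple, keys), points[:, 2]):
        lo, hi = extents.get(k, (z, z))
        extents[k] = (min(lo, z), max(hi, z))
    return np.array([extents[tuple(k)][1] - extents[tuple(k)][0] >= min_extent
                     for k in keys])

pts = np.array([[0.2, 0.1, 0.0], [0.3, 0.2, 12.0],   # a pylon-like column
                [5.0, 5.0, 10.0], [6.0, 5.0, 10.1]])  # wire-like points
print(split_by_vertical_profile(pts))  # [ True  True False False]
```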

    Indexing RFID data using the VG-curve

    Existing methods for the management of multidimensional data typically do not scale well with an increased number of dimensions or require unsupported augmentation of the kernel. However, the use of multidimensional data continues to grow in modern database applications, specifically in spatio-temporal databases. These systems produce vast volumes of multidimensional data, and such data is stored in commercial RDBMSs. Therefore, the efficient management of such multidimensional data is crucial. Although the approach is applicable to any multidimensional vector data, we consider Radio Frequency Identification (RFID) systems in this work. Due to RFID's acceptance and rapid growth into new and complex applications, together with the fact that its data, as with commercial applications, is stored within commercial RDBMSs, we have chosen RFID as a pertinent testbed. We show that its data can be represented as vectors in multidimensional space and that the VG-curve, combined with a Multidimensional Dynamic Clustering Primary Index that can be integrated into commercial RDBMSs, can be used to efficiently access such data. In an empirical study conducted on three-, five- and nine-dimensional RFID data, we show that the presented concept outperforms available off-the-shelf options with a fraction of the required space.
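    The core idea, linearising multidimensional RFID vectors with a space-filling curve and using the resulting one-dimensional key as a clustered primary index, can be sketched as follows. The VG-curve itself is not reproduced here; a plain Z-order (Morton) interleaving is used as a stand-in to show the mechanism, and the three-dimensional reading layout is a made-up example.

```python
# Sketch of space-filling-curve indexing, with Z-order (Morton) interleaving
# standing in for the paper's VG-curve: multidimensional readings map to one
# integer key, which can serve as a clustered primary index in an RDBMS.
def morton_key(coords, bits=16):
    """Interleave the bits of non-negative integer coordinates."""
    key = 0
    for bit in range(bits):
        for dim, c in enumerate(coords):
            key |= ((c >> bit) & 1) << (bit * len(coords) + dim)
    return key

# Hypothetical 3D RFID reading: (tag_id_bucket, reader_id, time_bucket).
rows = [(3, 7, 12), (3, 7, 13), (9, 1, 12)]
for r in sorted(rows, key=morton_key):  # nearby readings cluster together
    print(morton_key(r), r)
```

    Clustering the table on such a key keeps readings that are close in tag/reader/time space close on disk, which is how a one-dimensional primary index in an off-the-shelf RDBMS can serve multidimensional range queries.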

    Probabilistic Schema Covering

    Schema covering is the process of representing large and complex schemas by easily comprehensible common objects. This task is done by identifying a set of common concepts from a repository, called the concept repository, and generating a cover that describes the schema in terms of those concepts. The traditional schema covering approach has two shortcomings: it does not model the uncertainty in the covering process, and it requires the user to state an ambiguity constraint that is hard to define. We remedy these problems by incorporating a probabilistic model into schema covering to generate a probabilistic schema cover. The integrated probabilities not only enhance the coverage of the cover results but also eliminate the need to define the ambiguity parameter. Both probabilistic and traditional schema covering run on top of a concept repository. Experiments on real datasets show the competitive performance of our approach.
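    A probabilistic cover can be pictured as a weighted set-cover pass in which each concept's contribution is discounted by its matching probability. The sketch below is a hypothetical greedy variant; the repository contents, the probabilities, and the minimum-probability floor are made up for illustration, and the paper's actual probabilistic model is more involved.

```python
# Hypothetical greedy sketch of a probabilistic schema cover: repeatedly pick
# the concept with the best probability-weighted coverage of the still
# uncovered schema attributes. Probabilities and the floor are illustrative.
def probabilistic_cover(schema_attrs, concepts, min_prob=0.3):
    """concepts: {name: (prob, set_of_attrs)}; returns [(name, prob), ...]."""
    uncovered = set(schema_attrs)
    cover = []
    while uncovered:
        name, (prob, attrs) = max(
            concepts.items(),
            key=lambda kv: kv[1][0] * len(kv[1][1] & uncovered))
        if prob < min_prob or not attrs & uncovered:
            break  # nothing useful remains above the probability floor
        cover.append((name, prob))
        uncovered -= attrs
    return cover

schema = {"id", "name", "price", "qty", "ship_addr"}
repo = {"Product": (0.9, {"id", "name", "price"}),
        "Order":   (0.7, {"qty", "ship_addr"}),
        "Person":  (0.4, {"name"})}
print(probabilistic_cover(schema, repo))  # [('Product', 0.9), ('Order', 0.7)]
```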